(Date: November 9, 2011) 



On the Stability of Bootstrap 
Estimators 

A. Christmann and M. Salibian-Barrera and S. Van Aelst 

University of Bayreuth, Department of Mathematics, Bayreuth, GERMANY,
e-mail: andreas.christmann@uni-bayreuth.de

University of British Columbia, Department of Statistics, Vancouver, CANADA,
e-mail: matias@stat.ubc.ca

University of Ghent, Department of Applied Mathematics and Computer Science,
Ghent, BELGIUM,
e-mail: Stefan.VanAelst@UGent.be

Abstract: It is shown that bootstrap approximations of an estimator
which is based on a continuous operator from the set of Borel probability
measures defined on a compact metric space into a complete separable
metric space are stable in the sense of qualitative robustness. Support
vector machines based on shifted loss functions are treated as special
cases.

Keywords and phrases: bootstrap, statistical machine learning, sta- 
bility, support vector machine, robustness. 

1. Introduction 

The finite sample distribution of many nonparametric methods from
statistical learning theory is unknown because the distribution P from
which the data were generated is unknown and because often only
asymptotic results on the behaviour of such methods are known.

The goal of this paper is to show that bootstrap approximations of an
estimator which is based on a continuous operator from the set of Borel
probability distributions defined on a compact metric space into a complete
separable metric space are stable in the sense of qualitative robustness.
As a special case it is shown that bootstrap approximations for the support
vector machine (SVM) are stable, both for the risk functional and for the
SVM operator itself. The results can be interpreted as generalizations of
theorems derived by [4].

The rest of the paper has the following structure. Section 2 gives the general
result and Section 3 contains the results for SVMs. All proofs are given in
Section 4.

2. On Qualitative Robustness of Bootstrap Estimators 

If not otherwise mentioned, we will use the Borel $\sigma$-algebra
$\mathcal{B}(A)$ on a set $A$ and denote the Borel $\sigma$-algebra on
$\mathbb{R}$ by $\mathcal{B}$.

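Assumption 1. Let $(\Omega, \mathcal{A}, \mu)$ be a probability space, where
$\mu$ is unknown, $(\mathcal{Z}, d_{\mathcal{Z}})$ be a compact metric space,
and $\mathcal{B}(\mathcal{Z})$ be the Borel $\sigma$-algebra on $\mathcal{Z}$.
Denote the set of all Borel probability measures on
$(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ by
$\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$. On
$\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ we use the Borel
$\sigma$-algebra $\mathcal{B}(\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z})))$
and the bounded Lipschitz metric $d_{BL}$, see (4.11). Let $S$ be a statistical
operator defined on $\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ with
values in a complete, separable metric space $(\mathcal{W}, d_{\mathcal{W}})$
equipped with its Borel $\sigma$-algebra $\mathcal{B}(\mathcal{W})$. Let
$Z, Z_n : (\Omega, \mathcal{A}, \mu) \to (\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$,
$n \in \mathbb{N}$, be independent and identically distributed random variables
and denote the image measure by $P := Z \circ \mu$. Let $S_n(Z_1, \ldots, Z_n)$
be a statistic with values in $(\mathcal{W}, \mathcal{B}(\mathcal{W}))$. Denote
the empirical measure of $(Z_1, \ldots, Z_n)$ by
$P_n := \frac{1}{n} \sum_{i=1}^{n} \delta_{Z_i}$. The statistic $S_n$ is defined
via the operator

$$S : \bigl(\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z})),\ \mathcal{B}(\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z})))\bigr) \to (\mathcal{W}, \mathcal{B}(\mathcal{W})),$$

where $S(P_n) = S_n(Z_1, \ldots, Z_n)$. Denote the distribution of
$S_n(Z_1, \ldots, Z_n)$ when $Z_i \sim P$ by
$\mathcal{L}_n(S; P) := \mathcal{L}(S_n(Z_1, \ldots, Z_n))$. Accordingly, we
denote the distribution of $S_n(Z_1, \ldots, Z_n)$ when $Z_i \sim P_n$ by
$\mathcal{L}_n(S; P_n)$.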

Efron [9, 10] proposed the bootstrap, whose main idea is to approximate
the unknown distribution $\mathcal{L}_n(S; P)$ by $\mathcal{L}_n(S; P_n)$.
Note that these bootstrap approximations $\mathcal{L}_n(S; P_n)$ are
(probability measure-valued) random variables with values in
$\mathcal{M}_1(\mathcal{W}, \mathcal{B}(\mathcal{W}))$.
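As an illustration that is not part of the original paper, the bootstrap
approximation $\mathcal{L}_n(S; P_n)$ can itself be approximated by Monte
Carlo resampling. The following Python sketch assumes real-valued data on
the compact set $[0,1]$ and takes $S$ to be the mean functional
$P \mapsto \int z \, dP(z)$, which is continuous with respect to the weak
topology on this space. The function name `bootstrap_distribution`, the
sample size, and the number of resamples are hypothetical choices.

```python
import numpy as np

def bootstrap_distribution(z, statistic, n_boot=2000, seed=0):
    """Monte Carlo draws from L_n(S; P_n): resample n points from the
    empirical measure P_n (with replacement from z) and apply S_n."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z)
    n = len(z)
    # Each row holds the indices of one bootstrap resample of size n.
    idx = rng.integers(0, n, size=(n_boot, n))
    return np.array([statistic(z[row]) for row in idx])

rng = np.random.default_rng(1)
z = rng.uniform(0.0, 1.0, size=50)   # sample from some P on [0, 1] (assumed)
draws = bootstrap_distribution(z, np.mean)
```

The array `draws` is a finite sample from $\mathcal{L}_n(S; P_n)$, so its
empirical distribution is only a simulation-based stand-in for that
measure.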

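Following [4] we call a sequence of bootstrap approximations
$\mathcal{L}_n(S; P_n)$ qualitatively robust at
$P \in \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ if the sequence
of transformations

$$g_n : \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z})) \to \mathcal{M}_1\bigl(\mathcal{M}_1(\mathcal{W}, \mathcal{B}(\mathcal{W}))\bigr), \quad g_n(Q) = \mathcal{L}(\mathcal{L}_n(S; Q_n)), \quad n \in \mathbb{N}, \qquad (2.1)$$

is asymptotically equicontinuous at
$P \in \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$, i.e. if

$$\forall \varepsilon > 0 \ \exists \delta > 0 \ \exists n_0 \in \mathbb{N}: \quad d_{BL}(Q, P) < \delta \ \Rightarrow\ \sup_{n \ge n_0} d_{BL}\bigl(\mathcal{L}(\mathcal{L}_n(S; Q_n)),\ \mathcal{L}(\mathcal{L}_n(S; P_n))\bigr) < \varepsilon. \qquad (2.2)$$

Following [4] again, we call a sequence of statistics
$(S_n)_{n \in \mathbb{N}}$ uniformly qualitatively robust in a neighborhood
$\mathcal{U}(P_0)$ of $P_0 \in \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ if

$$\exists n_0 \in \mathbb{N} \ \forall \varepsilon > 0 \ \forall n \ge n_0 \ \exists \delta > 0 \ \forall P \in \mathcal{U}(P_0): \quad d_{BL}(Q, P) < \delta \ \Rightarrow\ d_{BL}\bigl(\mathcal{L}_n(S; Q),\ \mathcal{L}_n(S; P)\bigr) < \varepsilon. \qquad (2.3)$$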

The following two results and Theorem 8 in the next section are the main 
results of this paper. 

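Theorem 2. If Assumption 1 is valid and if $S$ is uniformly continuous in
a neighborhood $\mathcal{U}(P_0)$ of
$P_0 \in \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$, then the
sequence $(S_n(Z_1, \ldots, Z_n))_{n \in \mathbb{N}}$ is uniformly
qualitatively robust in $\mathcal{U}(P_0)$.

Theorem 3. If Assumption 1 is valid and if
$(S_n(Z_1, \ldots, Z_n))_{n \in \mathbb{N}}$ is uniformly qualitatively robust
in a neighborhood $\mathcal{U}(P_0)$ of
$P_0 \in \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$, then the
sequence $\mathcal{L}_n(S; P_n)$ of bootstrap approximations of
$\mathcal{L}_n(S; P)$ is qualitatively robust for $P_0$.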

As an immediate consequence from both theorems given above we obtain 

Corollary 4. If Assumption 1 is valid and if $S$ is a continuous operator,
then the sequence $\mathcal{L}_n(S; P_n)$ of bootstrap approximations of
$\mathcal{L}_n(S; P)$ is qualitatively robust for all
$P \in \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$.

Remark 5. Theorems 2 and 3 can be considered as a generalization
of [4, Thm. 2, Thm. 3], who considered the case that
$\mathcal{W} := A \subset \mathbb{R}$ is a finite interval and that
$Z_1, \ldots, Z_n$ are $\mathbb{R}$-valued random variables, i.e.
$\mathcal{Z} := \mathbb{R}$. In our case, the statistics
$S_n(Z_1, \ldots, Z_n)$ are $\mathcal{W}$-valued, where $\mathcal{W}$ is a
complete separable metric space whose dimension can be infinite.

3. On Qualitative Robustness of Bootstrap SVMs 

In this section we will apply the previous results to support vector machines,
which belong to the modern class of statistical machine learning methods.
That is, we will consider the special case that $\mathcal{W}$ is a reproducing
kernel Hilbert space $H$ used by a support vector machine (SVM). Note that $H$
typically has infinite dimension, which is true, e.g., if the popular Gaussian
RBF kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$,
$k(x, x') := \exp(-\gamma \|x - x'\|_2^2)$ for $\gamma > 0$, is used.
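The Gaussian RBF kernel above can be written down in a few lines. The
following Python sketch is only an illustration; the function name
`gaussian_rbf`, the value $\gamma = 0.5$, and the random inputs are
assumptions and do not come from the paper. It also shows that
$k(x,x) = 1$ for this kernel, so that $\sup_x k(x,x) = 1$.

```python
import numpy as np

def gaussian_rbf(X1, X2, gamma=1.0):
    """Gram matrix with entries k(x, x') = exp(-gamma * ||x - x'||_2^2)."""
    X1 = np.atleast_2d(X1)
    X2 = np.atleast_2d(X2)
    # Pairwise squared Euclidean distances via broadcasting.
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

X = np.random.default_rng(0).uniform(-1.0, 1.0, size=(5, 2))
K = gaussian_rbf(X, X, gamma=0.5)
```

The Gram matrix `K` is symmetric and positive semi-definite with unit
diagonal, in line with $k$ being a bounded continuous kernel.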

To state our result on the stability of bootstrap SVMs in Theorem 8 below,
we need the following assumptions on the loss function and the kernel.

Assumption 6. Let $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$ be a compact
metric space with metric $d_{\mathcal{Z}}$, where
$\mathcal{Y} \subset \mathbb{R}$ is closed. Let
$L : \mathcal{X} \times \mathcal{Y} \times \mathbb{R} \to [0, \infty)$ be a
loss function such that $L$ is continuous and convex with respect to its
third argument and that $L$ is uniformly Lipschitz continuous with respect to
its third argument with uniform Lipschitz constant $|L|_1 > 0$, i.e. $|L|_1$
is the smallest constant $c$ such that
$\sup_{(x,y) \in \mathcal{X} \times \mathcal{Y}} |L(x, y, t) - L(x, y, t')| \le c\,|t - t'|$
for all $t, t' \in \mathbb{R}$. Denote the shifted loss function by
$L^\star(x, y, t) := L(x, y, t) - L(x, y, 0)$,
$(x, y, t) \in \mathcal{X} \times \mathcal{Y} \times \mathbb{R}$. Let
$k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be a continuous kernel with
reproducing kernel Hilbert space $H$ and assume that $k$ is bounded by
$\|k\|_\infty := \bigl(\sup_{x \in \mathcal{X}} k(x, x)\bigr)^{1/2} \in (0, \infty)$.
Let $\lambda \in (0, \infty)$.
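To make the shifted loss concrete, the following Python sketch uses the
absolute distance loss $L(x, y, t) = |y - t|$, which is convex and Lipschitz
continuous in $t$ with $|L|_1 = 1$. This loss is chosen here only as an
example and is not singled out in the paper; since it does not depend on
$x$, the argument $x$ is dropped, and the helper name `shifted` is
hypothetical. The point of the shift is that $|L^\star(x, y, t)| \le |L|_1 |t|$
holds even when $|y|$ is very large.

```python
import numpy as np

def abs_loss(y, t):
    """Absolute distance loss L(y, t) = |y - t| (Lipschitz constant 1 in t)."""
    return np.abs(y - t)

def shifted(loss):
    """Return the shifted loss L*(y, t) = L(y, t) - L(y, 0)."""
    return lambda y, t: loss(y, t) - loss(y, 0.0)

abs_loss_star = shifted(abs_loss)

y = np.array([-1e6, -2.0, 0.0, 3.0, 1e6])   # includes extreme responses
t = np.array([0.5, -1.0, 2.0, 3.0, -4.0])
vals = abs_loss_star(y, t)
```

Unlike $L$ itself, the shifted loss can take negative values, but it stays
bounded by $|t|$ in absolute value regardless of the size of $y$.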

These assumptions can be considered as standard assumptions for stable
SVMs, see, e.g., [1] and [15, Chap. 10].

In this paper the RKHS $H$, the penalizing constant $\lambda$, and the loss
function $L$, and thus the shifted loss function $L^\star$, are fixed.
Therefore, in the next definition we just write $S$ and $R$ instead of
$S_{L^\star, H, \lambda}$ and $R_{L^\star, H, \lambda}$ to shorten the notation.

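Definition 7. The SVM operator
$S : \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z})) \to H$ is defined by

$$S(P) := f_{L^\star, P, \lambda} := \arg\min_{f \in H}\ \mathbb{E}_P L^\star(X, Y, f(X)) + \lambda \|f\|_H^2. \qquad (3.4)$$

The SVM risk functional
$R : \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z})) \to \mathbb{R}$ is
defined by

$$R(P) := \mathbb{E}_P L^\star(X, Y, S(P)(X)) = \mathbb{E}_P L^\star(X, Y, f_{L^\star, P, \lambda}(X)). \qquad (3.5)$$

If Assumption 6 is valid, then $S$ is well-defined because $S(P) \in H$ exists
and is unique, $R$ is well-defined because $R(P) \in \mathbb{R}$ exists and is
unique, and it holds, for all
$P \in \mathcal{M}_1(\mathcal{X} \times \mathcal{Y})$,

$$\|S(P)\|_\infty \le \frac{1}{\lambda} |L|_1 \|k\|_\infty^2 < \infty \quad \text{and} \quad |R(P)| \le \frac{1}{\lambda} |L|_1^2 \|k\|_\infty^2 < \infty, \qquad (3.6)$$

see [2, Thm. 5, Thm. 6, (17), (18)].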
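The bounds in (3.6) can be checked numerically on an empirical measure. The
following Python sketch is an illustration under several assumptions that
are not part of the paper: it uses the logistic loss
$L(x, y, t) = \ln(1 + e^{-yt})$ with $|L|_1 = 1$, the Gaussian RBF kernel
with $\|k\|_\infty = 1$, simulated data, and a plain functional gradient
descent instead of an exact solver. All function names are hypothetical.
Because the regularized objective with the shifted loss equals $0$ at
$f = 0$, every iterate with non-positive objective value already satisfies
both bounds, so the check does not require the exact minimizer.

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gaussian RBF Gram matrix, so that ||k||_inf = 1."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def logistic_loss_star(y, t):
    """Shifted logistic loss L*(y, t) = log(1 + exp(-y t)) - log(2)."""
    return np.logaddexp(0.0, -y * t) - np.log(2.0)

def fit_svm(K, y, lam, steps=500, eta=1.0):
    """Functional gradient descent on (1/n) sum_i L*(y_i, f(x_i)) + lam ||f||_H^2
    for f = sum_i alpha_i k(., x_i); starts at f = 0."""
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(steps):
        f = K @ alpha
        dloss = -y / (1.0 + np.exp(y * f))     # derivative of L w.r.t. t
        alpha -= eta * (dloss / n + 2.0 * lam * alpha)
    return alpha

rng = np.random.default_rng(0)
n, lam = 40, 0.1
X = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.where(X[:, 0] + 0.3 * rng.standard_normal(n) > 0, 1.0, -1.0)

K = rbf_gram(X)
alpha = fit_svm(K, y, lam)
f = K @ alpha                                   # fitted values f(x_i)
risk = logistic_loss_star(y, f).mean()          # empirical shifted risk
objective = risk + lam * (alpha @ K @ alpha)    # regularized objective
sup_norm_on_sample = np.abs(f).max()
bound = 1.0 / lam     # (1/lam) |L|_1 ||k||_inf^2 with |L|_1 = ||k||_inf = 1
```

Note that `sup_norm_on_sample` only evaluates $|f|$ at the data points, so
it is a lower bound for $\|f\|_\infty$; the comparison with `bound` is
therefore a consistency check rather than a proof.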

Theorem 8. If the general Assumption 1 and Assumption 6 are valid,
then the SVM operator $S$ and the SVM risk functional $R$ fulfill:

(i) The sequence $\mathcal{L}_n(S; P_n)$ of bootstrap SVM estimators of
$\mathcal{L}_n(S; P)$ is qualitatively robust for all
$P \in \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$.
(ii) The sequence $\mathcal{L}_n(R; P_n)$ of bootstrap SVM risk estimators of
$\mathcal{L}_n(R; P)$ is qualitatively robust for all
$P \in \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$.

4. Proofs

4.1. Proofs of the results in Section 2 

For the proofs we need Theorem 9 and Theorem 10, see below. To state
Theorem 9 on uniform Glivenko-Cantelli classes, we need the following notation.
For any metric space $(S, d)$ and real-valued function $f : S \to \mathbb{R}$,
we denote the bounded Lipschitz norm of $f$ by

$$\|f\|_{BL} := \sup_{x \in S} |f(x)| + \sup_{x \ne y} \frac{|f(x) - f(y)|}{d(x, y)}. \qquad (4.7)$$

Let $\mathcal{F}$ be a set of measurable functions from
$(S, \mathcal{B}(S))$ to $(\mathbb{R}, \mathcal{B})$. For any function
$G : \mathcal{F} \to \mathbb{R}$ (such as a signed measure) define

$$\|G\|_{\mathcal{F}} := \sup\{|G(f)| : f \in \mathcal{F}\}. \qquad (4.8)$$

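Theorem 9. [8, Prop. 12] For any separable metric space $(S, d)$ and
$M \in (0, \infty)$,

$$\mathcal{F}_M := \{f : (S, \mathcal{B}(S)) \to (\mathbb{R}, \mathcal{B});\ \|f\|_{BL} \le M\} \qquad (4.9)$$

is a universal Glivenko-Cantelli class. It is a uniform Glivenko-Cantelli
class, i.e., for all $\varepsilon > 0$,

$$\lim_{n \to \infty}\ \sup_{\nu \in \mathcal{M}_1(S, \mathcal{B}(S))} \mathrm{Pr}^*\Bigl(\sup_{m \ge n} \|\nu_m - \nu\|_{\mathcal{F}_M} > \varepsilon\Bigr) = 0, \qquad (4.10)$$

if and only if $(S, d)$ is totally bounded. Here, $\mathrm{Pr}^*$ denotes the
outer probability.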

Note that the term $\|\nu_m - \nu\|_{\mathcal{F}_M}$ in (4.10) equals the
bounded Lipschitz metric $d_{BL}$ of the probability measures $\nu_m$ and
$\nu$ if $M = 1$, i.e.

$$\|\nu_m - \nu\|_{\mathcal{F}_1} = \sup_{f \in \mathcal{F}_1} |(\nu_m - \nu)(f)| = \sup_{f;\ \|f\|_{BL} \le 1} \Bigl| \int f \, d\nu_m - \int f \, d\nu \Bigr| = d_{BL}(\nu_m, \nu), \qquad (4.11)$$

see [7, p. 394]. Hence, Theorem 9 can be interpreted as a generalization of
[4, Lemma 1, p. 186], which says that if $A \subset \mathbb{R}$ is a finite
interval, then $d_{BL}(P_m, P)$ converges almost surely to $0$ uniformly in
$P \in \mathcal{M}_1(A, \mathcal{B}(A))$. For various characterizations of
Glivenko-Cantelli classes, we refer to [16, Thm. 22] and [6].
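To get a feeling for the metric in (4.11), consider the following small
worked example, which is added here for illustration and is not taken from
the cited references. For two Dirac measures $\delta_x$ and $\delta_y$ on
$(S, d)$ write $d := d(x, y)$, let $a := \sup_z |f(z)|$, and let $b$ denote
the Lipschitz constant of $f$, so that $\|f\|_{BL} = a + b$ by (4.7).

```latex
\begin{align*}
d_{BL}(\delta_x, \delta_y)
  &= \sup_{\|f\|_{BL} \le 1} |f(x) - f(y)|
   \le \sup_{a + b \le 1} \min\{2a,\ b\,d\}
   = \frac{2d}{2 + d}, \\
\text{with equality for }
f(z) &:= \max\{-a,\ a - b\, d(z, x)\},
\qquad a = \frac{d}{2 + d}, \quad b = \frac{2}{2 + d}.
\end{align*}
```

In particular, $d_{BL}(\delta_x, \delta_y) < 2$ for all $x, y$, and the
metric behaves like $d(x, y)$ for nearby points.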

We next list the other main result we need for the proof of Theorem 8.
This result is an analogue of the famous Strassen theorem for the bounded
Lipschitz metric $d_{BL}$ instead of the Prohorov metric.

Theorem 10. [13, Thm. 4.2, p. 30] Let $\mathcal{Z}$ be a Polish space with
topology $\tau_{\mathcal{Z}}$. Let $d_{BL}$ be the bounded Lipschitz metric
defined on the set $\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ of
all Borel probability measures on $\mathcal{Z}$. Then the following two
statements are equivalent:

(i) There are random variables $\xi_1$ with distribution $\nu_1$ and $\xi_2$
with distribution $\nu_2$ such that
$\mathbb{E}[d_{BL}(\xi_1, \xi_2)] \le \varepsilon$.
(ii) $d_{BL}(\nu_1, \nu_2) \le \varepsilon$.

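Proof of Theorem 2. We closely follow the proof by [4, Thm. 2]. However,
we use Theorem 9 instead of their Lemma 1 and we use [3, Lem. 1] instead
of [12, Lem. 1].

Let $\mathcal{P}_n \subset \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$
be the set of empirical distributions of order $n \in \mathbb{N}$, i.e.

$$\mathcal{P}_n := \Bigl\{P_n \in \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}));\ \exists\, (z_1, \ldots, z_n) \in \mathcal{Z}^n \text{ such that } P_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{z_i}\Bigr\}, \qquad (4.12)$$

and let $\mathcal{E}_n \subset \mathcal{P}_n$. If misunderstandings are
unlikely, we identify $\mathcal{E}_n$ with the set $\{z_1, \ldots, z_n\}$ of
atoms.
It is enough to show that

$$\forall \varepsilon > 0 \ \exists \delta > 0 \ \forall P \in \mathcal{U}(P_0) \ \exists \text{ sequence } (\mathcal{E}_n)_{n \in \mathbb{N}} \text{ with } \mathcal{E}_n \subset \mathcal{P}_n \qquad (4.13)$$

such that $P^n(\mathcal{E}_n) > 1 - \varepsilon$ and for all
$Q_n \in \mathcal{E}_n$ and for all $Q_n' \in \mathcal{P}_n$ we have

$$d_{BL}(Q_n, Q_n') < \delta \ \Rightarrow\ d_{\mathcal{W}}(S(Q_n), S(Q_n')) < \varepsilon. \qquad (4.14)$$

From this we obtain that $(S_n)_{n \in \mathbb{N}}$ is uniformly qualitatively
robust by [3, Lem. 1].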

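Let $\varepsilon > 0$. Since the operator $S$ is uniformly continuous in
$\mathcal{U}(P_0)$ we obtain

$$\exists \delta_0 > 0 \ \forall P \in \mathcal{U}(P_0): \quad d_{BL}(P, Q) < \delta_0 \ \Rightarrow\ d_{\mathcal{W}}(S(P), S(Q)) < \varepsilon/2. \qquad (4.15)$$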

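Hence by Theorem 9 for the special case $M = 1$ and by (4.11), we get

$$\exists n_0 \in \mathbb{N}: \quad \inf_{P \in \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))} \mathrm{Pr}^*\Bigl(\sup_{n \ge n_0} d_{BL}(P_n, P) < \delta_0/2\Bigr) \ge 1 - \varepsilon. \qquad (4.16)$$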

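For $n \ge n_0$ and $P \in \mathcal{U}(P_0)$, define

$$\mathcal{E}_{n,P} := \{Q_n \in \mathcal{P}_n : d_{BL}(Q_n, P) < \delta_0/2\}. \qquad (4.17)$$

It follows that $P^n(\mathcal{E}_{n,P}) \ge 1 - \varepsilon$. Moreover,
$Q_n \in \mathcal{E}_{n,P}$ together with $Q_n' \in \mathcal{P}_n$ and
$d_{BL}(Q_n, Q_n') < \delta_0/2$ implies that

$$d_{BL}(Q_n, P) < \delta_0/2 \quad \text{and} \quad d_{BL}(Q_n', P) < \delta_0.$$

The triangle inequality thus yields due to (4.15)

$$d_{\mathcal{W}}(S(Q_n), S(Q_n')) \le d_{\mathcal{W}}(S(Q_n), S(P)) + d_{\mathcal{W}}(S(P), S(Q_n')) < \varepsilon, \qquad (4.18)$$

from which the assertion follows. ■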

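Proof of Theorem 3. The proof mimics the proof of [4, Thm. 3], but uses
Theorem 9 instead of [4, Lem. 1].

Fix $P_0 \in \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ and
$\varepsilon > 0$. By the uniform qualitative robustness of
$(S_n)_{n \in \mathbb{N}}$ in $\mathcal{U}(P_0)$, there exists
$n \in \mathbb{N}$ such that for all $\varepsilon > 0$ there exists
$\delta > 0$ such that

$$d_{BL}(Q, P) < \delta \ \Rightarrow\ \sup_{m \ge n}\ \sup_{P \in \mathcal{U}(P_0)} d_{BL}\bigl(\mathcal{L}_m(S; Q),\ \mathcal{L}_m(S; P)\bigr) < \varepsilon. \qquad (4.19)$$

Define $\delta_1 := \delta/2$. Due to Theorem 9 for the special case $M = 1$
and by (4.11), we have, for all $\varepsilon > 0$,

$$\lim_{n \to \infty}\ \sup_{P \in \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))} \mathrm{Pr}^*\Bigl(\sup_{m \ge n} d_{BL}(P_m, P) > \varepsilon\Bigr) = 0. \qquad (4.20)$$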

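Hence (4.19) and Varadarajan's theorem on the almost sure convergence of
empirical measures to a Borel probability measure defined on a separable
metric space, see e.g. [7, Thm. 11.4.1, p. 399], yield for the empirical
distributions $Q_n$ from $Q$ and $P_{0,n}$ from $P_0$ that

$$\exists n_1 \ge n \ \forall n \ge n_1: \quad d_{BL}(Q, P_0) < \delta_1 \ \Rightarrow\ d_{BL}(Q_n, P_{0,n}) < \delta \quad \text{almost surely}. \qquad (4.21)$$

It follows from the uniform qualitative robustness of
$(S_n)_{n \in \mathbb{N}}$, see (4.19), that

$$\exists n_1 \in \mathbb{N} \ \forall \varepsilon > 0 \ \forall n \ge n_1 \ \exists \delta > 0 \ \forall P \in \mathcal{U}(P_0): \quad d_{BL}(Q, P) < \delta \ \Rightarrow\ d_{BL}\bigl(\mathcal{L}_n(S; Q_n),\ \mathcal{L}_n(S; P_{0,n})\bigr) < \varepsilon \quad \text{almost surely}. \qquad (4.22)$$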

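For notational convenience, we write for the sequences of bootstrap
estimators

$$\xi_{1,n} := \mathcal{L}_n(S; Q_n), \qquad \xi_{2,n} := \mathcal{L}_n(S; P_{0,n}), \qquad n \in \mathbb{N}. \qquad (4.23)$$

Note that $\xi_{1,n}$ and $\xi_{2,n}$ are (measure-valued) random variables
with values in the set $\mathcal{M}_1(\mathcal{W}, \mathcal{B}(\mathcal{W}))$.
We denote the distribution of $\xi_{j,n}$ by $\mathcal{L}(\xi_{j,n})$ for
$j \in \{1, 2\}$ and $n \in \mathbb{N}$. Hence (4.22) yields

$$d_{BL}(\xi_{1,n}, \xi_{2,n}) < \varepsilon \quad \text{almost surely for all } n \ge n_1, \qquad (4.24)$$

and it follows

$$\mathbb{E}[d_{BL}(\xi_{1,n}, \xi_{2,n})] \le \varepsilon \quad \forall n \ge n_1. \qquad (4.25)$$

Now an application of an analogue of Strassen's theorem, see Theorem 10,
yields

$$\sup_{n \ge n_1} d_{BL}\bigl(\mathcal{L}(\xi_{1,n}),\ \mathcal{L}(\xi_{2,n})\bigr) \le \varepsilon, \qquad (4.26)$$

which completes the proof, because

$$\mathcal{L}(\xi_{1,n}) = \mathcal{L}(\mathcal{L}_n(S; Q_n)) \quad \text{and} \quad \mathcal{L}(\xi_{2,n}) = \mathcal{L}(\mathcal{L}_n(S; P_{0,n})). \qquad (4.27)$$

■

4.2. Proofs of the results in Section 3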

Proof of Theorem 8. Proof of part (i). By assumption,
$(\mathcal{Z}, d_{\mathcal{Z}})$ is a compact metric space, where
$\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$. Let $\mathcal{B}(\mathcal{Z})$
be the Borel $\sigma$-algebra on $\mathcal{Z}$. It is well-known that the
bounded Lipschitz metric $d_{BL}$ metrizes the weak topology on the space
$\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$, see [7, Thm. 11.3.3],
and that $(\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z})), d_{BL})$ is a
compact metric space if and only if $(\mathcal{Z}, d_{\mathcal{Z}})$ is a
compact metric space, see [14, p. 45, Thm. 6.4]. From the compactness of
$(\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z})), d_{BL})$, it of course
follows that this metric space is separable and totally bounded, see
[5, Thm. 1.4.26].

Under the assumptions of the theorem we have, for all fixed
$\lambda \in (0, \infty)$, that the SVM operator
$S : \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z})) \to H$,
$S(P) = f_{L^\star, P, \lambda}$, is well-defined because it exists and is
unique, see [2, Thm. 5, Thm. 6], and is continuous with respect to the
combination of the weak topology on
$\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ and the norm topology
on $H$, see [11, Thm. 3.3, Cor. 3.4]. There it was also shown that the
operator $\tilde{S} : \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z})) \to C_b(\mathcal{Z})$,
$P \mapsto f_{L^\star, P, \lambda}$, is continuous with respect to the
combination of the weak topology on
$\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ and the norm topology
on $C_b(\mathcal{Z})$. Because
$(\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z})), d_{BL})$ is a compact
metric space, the operators $S$ and $\tilde{S}$ are therefore even uniformly
continuous on the whole space
$\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ with respect to the
mentioned topologies, see [5, Prop. 1.5.9].

Because the reproducing kernel Hilbert space $\mathcal{W} := H$ is a Hilbert
space, $H$ is complete. Furthermore, because the input space $\mathcal{X}$ is
separable and the kernel $k$ is continuous, the RKHS $H$ is also separable,
see [15, Lem. 4.33]. Therefore, Theorem 2 yields that the sequence of
$H$-valued statistics

$$S_n((X_1, Y_1), \ldots, (X_n, Y_n)) = \arg\min_{f \in H}\ \frac{1}{n} \sum_{i=1}^{n} L^\star(X_i, Y_i, f(X_i)) + \lambda \|f\|_H^2, \quad n \in \mathbb{N}, \qquad (4.28)$$

is uniformly qualitatively robust in a neighborhood $\mathcal{U}(P_0)$ for
every probability measure $P_0 \in \mathcal{M}_1(\mathcal{Z})$. Now we apply
Theorem 3, which yields that the sequence
$(\mathcal{L}_n(S; P_n))_{n \in \mathbb{N}}$ of bootstrap SVM estimators of
$\mathcal{L}_n(S; P)$ is qualitatively robust for all
$P_0 \in \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$, which gives
the first assertion of the theorem.

Proof of part (ii). The proof consists of two steps. In Step 1 the continuity
of the SVM risk functional $R$ will be shown. In Step 2, Theorems 2 and 3
will be used to show that the sequence
$(\mathcal{L}_n(R; P_n))_{n \in \mathbb{N}}$ of bootstrap SVM risk estimators
is qualitatively robust.

Step 1. We will first show that the SVM risk functional
$R : \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z})) \to \mathbb{R}$ is
continuous with respect to the combination of the weak topology on
$\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ and the standard
topology on $\mathbb{R}$.

As mentioned in part (i), the assumption that
$(\mathcal{Z}, d_{\mathcal{Z}})$ is a compact metric space implies that
$(\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z})), d_{BL})$ is a compact
metric space and hence this space is separable and totally bounded.

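Under the assumptions of the theorem, the SVM operator
$S : \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z})) \to H$,
$S(P) = f_{L^\star, P, \lambda}$, is well-defined because $S(P)$ exists and is
unique for all $P \in \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$
and for all $\lambda \in (0, \infty)$, see [2, Thm. 5, Thm. 6]. Furthermore,
$S$ is continuous with respect to the combination of the weak topology on
$\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ and the norm topology
on $H$, see [11, Thm. 3.3]. Hence the function

$$g_P : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}, \quad g_P(x, y) := L^\star(x, y, S(P)(x)) = L^\star(x, y, f_{L^\star, P, \lambda}(x)) \qquad (4.29)$$

is well-defined. Because the kernel $k$ is bounded and continuous, all
functions $f \in H$, and hence in particular
$S(P) = f_{L^\star, P, \lambda} \in H$, are continuous, see e.g.
[15, Lem. 4.28, Lem. 4.29]. Hence the function $g_P$ is continuous (with
respect to $(x, y)$), because the loss function $L$ and hence the shifted
loss function $L^\star(x, y, t) = L(x, y, t) - L(x, y, 0)$,
$(x, y, t) \in \mathcal{X} \times \mathcal{Y} \times \mathbb{R}$, are
continuous. Furthermore, the function $g_P$ is bounded, because
$(\mathcal{Z}, d_{\mathcal{Z}})$ with
$\mathcal{Z} := \mathcal{X} \times \mathcal{Y}$ is by assumption a compact
metric space, the Lipschitz continuous loss function $L$ maps from
$\mathcal{X} \times \mathcal{Y} \times \mathbb{R}$ to $[0, \infty)$, and
$\|S(P)\|_\infty \le \frac{1}{\lambda} |L|_1 \|k\|_\infty^2 < \infty$, see
[2, p. 314, (17)]. Hence $g_P \in C_b(\mathcal{Z}, \mathbb{R})$. Because the
bounded Lipschitz metric $d_{BL}$ metrizes the weak topology on
$\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$, it follows that

$$\forall \varepsilon_1 > 0 \ \exists \delta_1 > 0: \quad d_{BL}(Q, P) < \delta_1 \ \Rightarrow\ \Bigl| \int g_P \, dQ - \int g_P \, dP \Bigr| < \varepsilon_1. \qquad (4.30)$$

Recall that $S : \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z})) \to H$
is continuous with respect to the combination of the weak topology on
$\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ and the norm topology
on $H$, see [11, Thm. 3.3]. Hence

$$\forall \varepsilon_2 > 0 \ \exists \delta_2 > 0: \quad d_{BL}(Q, P) < \delta_2 \ \Rightarrow\ \|S(Q) - S(P)\|_H < \varepsilon_2. \qquad (4.31)$$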

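Fix $\varepsilon > 0$. Define

$$\varepsilon_1 := \frac{\varepsilon}{3} \quad \text{and} \quad \varepsilon_2 := \frac{\varepsilon}{3\, |L|_1 \|k\|_\infty},$$

and let $\delta_1$ and $\delta_2$ be the corresponding constants from (4.30)
and (4.31). Using the triangle inequality in (4.33), the definition of the
shifted loss function $L^\star$ in (4.34), the definition of the function
$g_P$ in (4.35), the Lipschitz continuity of $L$ in (4.36), and the
well-known formula

$$\|f\|_\infty \le \|k\|_\infty \|f\|_H, \quad f \in H, \qquad (4.32)$$

see e.g. [15, p. 124], we obtain that
$d_{BL}(Q, P) < \min\{\delta_1, \delta_2\}$ implies

$$|R(Q) - R(P)| = \Bigl| \int L^\star(x, y, S(Q)(x)) \, dQ(x, y) - \int L^\star(x, y, S(P)(x)) \, dP(x, y) \Bigr|$$

$$\le \Bigl| \int L^\star(x, y, S(Q)(x)) \, dQ(x, y) - \int L^\star(x, y, S(P)(x)) \, dQ(x, y) \Bigr| \qquad (4.33)$$

$$\quad + \Bigl| \int L^\star(x, y, S(P)(x)) \, dQ(x, y) - \int L^\star(x, y, S(P)(x)) \, dP(x, y) \Bigr|$$

$$\le \int \bigl| L(x, y, S(Q)(x)) - L(x, y, S(P)(x)) \bigr| \, dQ(x, y) \qquad (4.34)$$

$$\quad + \Bigl| \int g_P \, dQ - \int g_P \, dP \Bigr| \qquad (4.35)$$

$$\overset{(4.30)}{\le} |L|_1 \, \|S(Q) - S(P)\|_\infty + \varepsilon_1 \qquad (4.36)$$

$$\overset{(4.32)}{\le} |L|_1 \, \|k\|_\infty \, \|S(Q) - S(P)\|_H + \varepsilon_1 \qquad (4.37)$$

$$\overset{(4.31)}{<} |L|_1 \, \|k\|_\infty \, \varepsilon_2 + \varepsilon_1 = \frac{2}{3}\,\varepsilon < \varepsilon. \qquad (4.38)$$

Hence, $R$ is continuous with respect to the combination of the weak topology
on $\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ and the standard
topology on $(\mathbb{R}, \mathcal{B})$.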
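For readers who want to see where the inequality (4.32) used in Step 1 comes
from, the following short derivation is a standard argument based on the
reproducing property $f(x) = \langle f, k(\cdot, x) \rangle_H$ and the
Cauchy-Schwarz inequality; it is added here as a reminder and is not
reproduced from [15].

```latex
\begin{align*}
|f(x)|
  &= \bigl| \langle f, k(\cdot, x) \rangle_H \bigr|
   \le \|f\|_H \, \|k(\cdot, x)\|_H
   = \|f\|_H \, \sqrt{k(x, x)}
   \le \|k\|_\infty \, \|f\|_H ,
   \qquad x \in \mathcal{X},\ f \in H .
\end{align*}
```

Taking the supremum over $x \in \mathcal{X}$ on the left-hand side gives
$\|f\|_\infty \le \|k\|_\infty \|f\|_H$.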
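Step 2. Because
$(\mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z})), d_{BL})$ is a compact
metric space and the risk functional
$R : \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z})) \to \mathbb{R}$ is
continuous, $R$ is even uniformly continuous with respect to the mentioned
topologies, see [5, Prop. 1.5.9]. Obviously
$(\mathcal{W}, d_{\mathcal{W}}) := (\mathbb{R}, |\cdot|)$ is a complete
separable metric space. Therefore, Theorem 2 yields that the sequence of
$\mathbb{R}$-valued statistics

$$R_n((X_1, Y_1), \ldots, (X_n, Y_n)) = \frac{1}{n} \sum_{i=1}^{n} L^\star(X_i, Y_i, f_{L^\star, D, \lambda}(X_i)), \quad n \in \mathbb{N},$$

where
$f_{L^\star, D, \lambda} := \arg\min_{f \in H} \frac{1}{n} \sum_{i=1}^{n} L^\star(X_i, Y_i, f(X_i)) + \lambda \|f\|_H^2$,
is uniformly qualitatively robust in a neighborhood $\mathcal{U}(P_0)$ for
every probability measure $P_0 \in \mathcal{M}_1(\mathcal{Z})$. Now we apply
Theorem 3, which yields that the sequence
$(\mathcal{L}_n(R; P_n))_{n \in \mathbb{N}}$ of bootstrap SVM risk estimators
of $\mathcal{L}_n(R; P)$ is qualitatively robust for all
$P_0 \in \mathcal{M}_1(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$, which
completes the proof. ■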